Information-Preserving Markov Aggregation 

Bernhard C. Geiger*, Christoph Temmel†

*Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria

†Department of Mathematics, VU University - Faculty of Sciences, Amsterdam, The Netherlands

geiger@ieee.org, ctc@temmel.me 






Abstract: We present a sufficient condition for a non-injective
function of a Markov chain to be a second-order Markov chain
with the same entropy rate as the original chain. This permits an 
information-preserving state space reduction by merging states 
or, equivalently, lossless compression of a Markov source on 
a sample-by-sample basis. The cardinality of the reduced state 
space is bounded from below by the node degrees of the transition 
graph associated with the original Markov chain. 

We also present an algorithm listing all possible information- 
preserving state space reductions, for a given transition graph. 
We illustrate our results by applying the algorithm to a bi-gram 
letter model of an English text. 

Index Terms: lossless compression, Markov chain, model order
reduction, n-gram model

I. Introduction 

Markov chains are used in many scientific fields, ranging
from machine learning and systems biology to speech processing
and information theory, where they act as models for sources
and channels. In some of these fields, however, the state space
of the Markov chain is too large to allow either proper training
of the model (see n-grams in speech processing [1]) or its
simulation (as in chemical reaction networks [2]).

One way to reduce the cardinality of the state space of a
Markov chain is to merge states, which is equivalent to feeding
the process through a non-injective function. The merging
usually depends on the cost function; candidate methods rely
either on the Fiedler vector or other spectral criteria [3], [4], or
on the Kullback-Leibler divergence rate w.r.t. some reference
process [5], [6].

In addition to the model information lost by merging, the
obtained process does not, in general, possess the Markov
property. Modeling it as a Markov chain on the reduced state
space as suggested in, e.g., [5], typically leads to an additional
loss of model information. The same holds for Markov models
obtained from clustered training data, as, e.g., for the n-gram
class model in [7]. Consequently, there is a trade-off
between cardinality of the state space, model complexity, and
information loss.

Recently, we have shown the existence of sufficient conditions
on a Markov chain and a non-injective function merging
its states such that the obtained process is not only a $k$th-order
Markov chain (which is desirable from a computational point
of view), but also preserves full model information [8]. While
the former property is commonly referred to as lumpability,
the latter is a rather surprising one: whereas, in principle,
stationary sources can be compressed efficiently by assigning
codewords to blocks of samples, our result shows that in some
cases lossless compression is possible on a sample-by-sample
basis.

Extending our previous results, we show in Section III,
using spectral theory of graphs, that for an information-preserving
compression, the number of input sequences
merged to the same output sequence is bounded independently
of the sequence length. This result allows us to estimate the
minimum cardinality of the reduced state space based on the
degree structure of the transition graph of the original Markov
chain. Furthermore, we prove that, if a specific partition of
the original state space satisfies the sufficient conditions for
$k$th-order Markovity and information-preservation, then so
does every refinement of this partition. Section IV focuses
on second-order Markov chains, due to their computationally
desirable properties, and presents an iterative algorithm listing
all possible partitions satisfying the abovementioned sufficient
conditions. To illustrate the algorithm, we introduce a simple
toy example in Section V before analyzing a bi-gram letter
model in Section VI.

II. Preliminaries & Notation 

Throughout this work, we deal with an irreducible, aperiodic,
homogeneous Markov chain $\mathbf{X}$ on a finite state space $\mathcal{X}$
and with transition matrix $P$. Let $X_n$ be the $n$th sample of the
process, and let $X_k^n := \{X_k, X_{k+1}, \dots, X_n\}$. We assume that
$\mathbf{X}$ is stationary, i.e., that the initial distribution of the chain
coincides with its invariant distribution $\mu$. Hence, for every $n$,
the distribution $P_{X_n}$ of $X_n$ equals $\mu$.

We consider a surjective lumping function $g\colon \mathcal{X} \to \mathcal{Y}$, with
$\mathrm{card}(\mathcal{X}) =: N \ge M := \mathrm{card}(\mathcal{Y}) \ge 2$. Abusing notation, we
extend $g$ to $\mathcal{X}^n \to \mathcal{Y}^n$ coordinate-wise and denote by $g^{-1}[y]$
the preimage of $y$ under $g$. We call the stationary stochastic
process $\mathbf{Y}$, defined by $Y_n := g(X_n)$, the lumped process
and the tuple $(P,g)$ the lumping.

Since the lumping function is non-injective, a loss of
information may occur, which we quantify by the conditional
entropy rate

$$\bar{H}(\mathbf{X}|\mathbf{Y}) := \lim_{n\to\infty} \frac{1}{n} H(X_1^n \mid Y_1^n) = \bar{H}(\mathbf{X}) - \bar{H}(\mathbf{Y}) \tag{1}$$

where $H(\cdot)$ and $\bar{H}(\cdot)$ denote the entropy and the entropy rate
(if it exists) of the argument, respectively. The lumping $(P,g)$
is information-preserving iff $\bar{H}(\mathbf{X}|\mathbf{Y}) = 0$.



III. Previous Results & Extensions 

We summarize several definitions and results from [8]
relevant to this work:

Definition 1 (Preimage Count). The preimage count of length
$n$ is the random variable

$$T_n := \sum_{\mathbf{x} \in g^{-1}[Y_1^n]} \left[\Pr(X_1^n = \mathbf{x}) > 0\right] \tag{2}$$

where $[A] = 1$ if $A$ is true and zero otherwise (Iverson
bracket).



In other words, the preimage count maps each sequence of 
length n of the output process Y to the cardinality of the 
realizable portion of its preimage. 
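The preimage count of a single output realization can be evaluated without enumerating all input sequences. The following is a minimal sketch, not taken from the paper: it assumes the chain is given by a 0/1 adjacency matrix (`adjacency[i][j]` is truthy iff $P_{i,j} > 0$), that the lumping is a list `g` mapping state indices to output symbols, and that every state has positive stationary probability. The function name `preimage_count` is hypothetical.

```python
def preimage_count(adjacency, g, y):
    """Number of realizable state sequences x with g(x) = y (coordinate-wise).

    adjacency[i][j] is truthy iff the transition i -> j has positive probability,
    g[i] is the output symbol of state i, and y is a sequence of output symbols.
    Assumption: every state has positive stationary probability.
    """
    n_states = len(g)
    # ways[i] = number of realizable paths that end in state i and match y so far
    ways = [1 if g[i] == y[0] else 0 for i in range(n_states)]
    for symbol in y[1:]:
        ways = [
            sum(ways[i] for i in range(n_states) if adjacency[i][j])
            if g[j] == symbol else 0
            for j in range(n_states)
        ]
    return sum(ways)
```

For a chain with a positive transition matrix whose states are all merged into one symbol, the count doubles (for two states) with every additional sample, which is the unbounded growth that rules out information preservation.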

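The following characterization holds [8, Thm. 1]:

$$\bar{H}(\mathbf{X}|\mathbf{Y}) = 0 \;\Leftrightarrow\; \exists C < \infty\colon \Pr\Big(\sup_{n\to\infty} T_n \le C\Big) = 1 \tag{3a}$$

$$\bar{H}(\mathbf{X}|\mathbf{Y}) > 0 \;\Leftrightarrow\; \exists C > 1\colon \Pr\Big(\liminf_{n\to\infty} \sqrt[n]{T_n} \ge C\Big) = 1 \tag{3b}$$

i.e., an almost-surely bounded preimage count (for arbitrary
sequence length $n$) is equivalent to a vanishing information
loss rate.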

The information-preserving case (3a) can be strengthened
to a deterministic version:



Proposition 1 (Bounded Preimage Count).

$$\bar{H}(\mathbf{X}|\mathbf{Y}) = 0 \;\Leftrightarrow\; \exists C < \infty\colon \sup_n T_n \le C. \tag{4}$$



Proof: See Appendix. ■ 

An interesting line for future research would be to show a
deterministic analog of (3b) and its direct derivation from the
Shannon-McMillan-Breiman theorem [9, Ch. 16.8].
As a corollary to Proposition 1 we get

Corollary 1. An information-preserving lumping $(P,g)$ satisfies

$$M \ge \min_i d_i \tag{5}$$

where $d_i := \sum_{j=1}^{N} [P_{i,j} > 0]$ is the out-degree of state $i$.

Proof: See Appendix. ■ 

Corollary 1 upper-bounds the possible state space reduction
of an information-preserving lumping. In particular, a Markov
chain with a positive transition matrix $P$ does not admit a
non-trivial information-preserving lumping [8, Cor. 4]: In this case, all
states have out-degree $N$, and the bound $M \ge N$ only holds
for the trivial lumping.

Complementing this necessary condition for preservation
of information, in [8, Prop. 10] we also gave a sufficient
condition, additionally implying that $\mathbf{Y}$ is a $k$th-order Markov
chain, i.e., that $\forall n\colon \forall y \in \mathcal{Y}, \mathbf{y} \in \mathcal{Y}^{n-1}$:

$$\Pr(Y_n = y \mid Y_1^{n-1} = \mathbf{y}) = \Pr(Y_n = y \mid Y_{n-k}^{n-1} = \mathbf{y}_{n-k}^{n-1}) \tag{6}$$

To this end, we introduced

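Definition 2 (Single Forward Sequence [8, Def. 9]). For $k \ge 2$,
a lumping $(P,g)$ has the single forward $k$-sequence property
(short: SFS($k$)) iff

$$\forall \mathbf{y} \in \mathcal{Y}^{k-1}, y \in \mathcal{Y}\colon \exists \mathbf{x}' \in g^{-1}[\mathbf{y}]\colon \forall x \in g^{-1}[y], \mathbf{x} \in g^{-1}[\mathbf{y}] \setminus \{\mathbf{x}'\}\colon \quad \Pr(X_2^k = \mathbf{x} \mid Y_2^k = \mathbf{y}, X_1 = x) = 0. \tag{7}$$

Thus, for every realization of $Y_1^k$, the realizable preimage of
$Y_2^k$ is a singleton. Therefore, SFS($k$) implies not only that $\mathbf{Y}$
is $k$th-order Markov, but also that the lumping is
information-preserving¹ [8, Prop. 10]. Note that SFS($k$) is a property of
the combinatorial structure of the transition matrix $P$, i.e., it
only depends on the location of its non-zero entries.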

The SFS($k$)-property has practical significance: Besides
preserving, if possible, the information of the original model,
those lumpings which possess the Markov property of any order
are preferable from a computational perspective. Moreover,
the corresponding conditions for the more desirable first-order
Markov output, not necessarily information-preserving, are too
restrictive in most scenarios (cf. [10, Sec. 6.3]).

The next result investigates a cascade of lumping functions.
Let $g := f \circ h$, where $h\colon \mathcal{X} \to \mathcal{Z}$ and $f\colon \mathcal{Z} \to \mathcal{Y}$. We identify
a function with the partition it induces on $\mathcal{X}$; thus, abusing
notation, we say that $h$ is a refinement of $g$ iff this holds for
their induced partitions.

Proposition 2 (SFS($k$) & Refinements). If a lumping $(P,g)$
is SFS($k$), then so is $(P,h)$, for all refinements $h$ of $g$.

Proof: See Appendix. ■ 

Clearly, a refinement does not increase the loss of information,
so information-preservation is preserved under refinements.
In contrast, a refinement of a lumping yielding a $k$th-order
Markov process $\mathbf{Y}$ need not possess that property; the
lumping to a single state has the Markov property, while a
refinement of it generally does not.

All SFS($k$)-lumpings lie within the intersection of
information-preserving lumpings and lumpings yielding a $k$th-order
Markov chain. However, the SFS($k$)-property does not
exhaust this intersection [8, Fig. 2]. In [8] we presented
sufficient conditions for a lumping to be either information-preserving
or to yield a $k$th-order Markov chain, respectively.
We currently do not know if SFS($k$) is identical to the
intersection of these two sufficient conditions.

IV. An Algorithm for SFS(2)-lumpings 

In this section, we present an algorithm listing all SFS(2)- 
lumpings, i.e., lumpings (P,g) yielding a second-order 
Markov chain and preserving full model information. 

¹Actually, SFS($k$) implies more than $\bar{H}(\mathbf{X}|\mathbf{Y}) = 0$: It implies that a
sequence of states of the reduced model uniquely determines the corresponding
sequence of the original model, except for the first sample. Thus, the reduced
model is in some sense "invertible".

SFS(2)-lumpings have the property that, for all $y_1, y_2 \in \mathcal{Y}$,
from within a set $g^{-1}[y_1]$ at most one element in the set
$g^{-1}[y_2]$ is accessible:

$$\exists x_2' \in g^{-1}[y_2]\colon \forall x_1 \in g^{-1}[y_1], x_2 \in g^{-1}[y_2] \setminus \{x_2'\}\colon \quad P_{x_1,x_2} = 0. \tag{8}$$

This gives rise to

Proposition 3. An SFS(2)-lumping satisfies

$$M \ge \max_i d_i. \tag{9}$$

Proof: We evaluate the rows of $P$ separately. All states
$x_2$ accessible from state $x_1$ are characterized by $P_{x_1,x_2} > 0$.
Any two states accessible from $x_1$ cannot be merged, since
this would contradict (8). Thus, all states accessible from $x_1$
must have different images, implying $M \ge d_{x_1}$. The result
follows by considering all states $x_1$. ■
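Condition (8) depends only on the positions of the non-zero entries of $P$, so it can be checked directly on the adjacency structure. The sketch below is an illustration under assumptions, not the authors' implementation: the partition is passed as an iterable of blocks of state indices, `adjacency[i][j]` is truthy iff $P_{i,j} > 0$, and the name `is_sfs2` is hypothetical.

```python
def is_sfs2(adjacency, blocks):
    """Return True iff the partition `blocks` satisfies condition (8).

    For every pair (source block, target block), at most one state of the
    target block may be accessible in one step from the source block.
    """
    for source in blocks:
        # All states reachable in one step from some state of the source block.
        reachable = {j for i in source
                     for j, a in enumerate(adjacency[i]) if a}
        for target in blocks:
            if len(reachable & set(target)) > 1:
                return False
    return True


# Hypothetical three-state chain with edges 0->1, 1->2, 2->0, 2->2.
example = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 1]]
print(is_sfs2(example, [{0, 1}, {2}]))  # merge of states 0 and 1
print(is_sfs2(example, [{0, 2}, {1}]))  # merge of states 0 and 2
```

In the second call both states of the merged block are accessible from state 2, which is exactly the situation excluded in the proof of Proposition 3.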

In particular, Proposition 3 implies that a transition matrix
with at least one positive row does not admit a non-trivial
SFS(2)-lumping.
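Both degree bounds can be read off the transition matrix directly. The following sketch assumes $P$ is given as a list of rows of probabilities; the function names are hypothetical and chosen for illustration only.

```python
def out_degrees(P):
    """Out-degree d_i of each state: number of positive entries in row i."""
    return [sum(1 for p in row if p > 0) for row in P]


def min_states_information_preserving(P):
    """Lower bound (5) on M for any information-preserving lumping."""
    return min(out_degrees(P))


def min_states_sfs2(P):
    """Lower bound (9) on M for any SFS(2)-lumping."""
    return max(out_degrees(P))


# Hypothetical example with one positive row (the last one).
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.2, 0.3, 0.5]]
print(out_degrees(P), min_states_information_preserving(P), min_states_sfs2(P))
```

Because the last row of this example is positive, bound (9) equals the number of states, so only the trivial lumping can be SFS(2).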

An algorithm listing all SFS(2)-lumpings, or SFS(2)-partitions,
for a given transition matrix $P$ has to check the
SFS(2)-property for all partitions of $\mathcal{X}$ into at least $\max_i d_i$
non-empty sets. The number of these partitions can be calculated
from the Stirling numbers of the second kind [11,
Thm. 8.2.5] and is typically too large to allow an exhaustive
search. Therefore, we use Proposition 2 to reduce the search
space.
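To get a feeling for the size of the unrestricted search space, the number of partitions of $N$ states into at least $\max_i d_i$ non-empty sets can be summed from Stirling numbers of the second kind via the standard recurrence $S(n,m) = m\,S(n-1,m) + S(n-1,m-1)$. This is a small sketch with hypothetical function names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, m):
    """Stirling number of the second kind: partitions of n items into m blocks."""
    if n == m:
        return 1
    if m == 0 or m > n:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)


def candidate_partitions(n_states, max_out_degree):
    """Number of partitions of n_states items into at least max_out_degree blocks."""
    return sum(stirling2(n_states, m)
               for m in range(max_out_degree, n_states + 1))


# Six states with maximum out-degree 3: S(6,3) + S(6,4) + S(6,5) + S(6,6).
print(candidate_partitions(6, 3))  # 90 + 65 + 15 + 1 = 171
```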

Starting from the trivial partition with $N$ elements, we
evaluate all possible merges of two states, i.e., all possible
partitions with $N-1$ sets, of which there exist $N(N-1)/2$. Out
of these, we drop those from the list which do not possess the
SFS(2)-property. The remaining set of admissible pairs is a
central element of the algorithm.

We proceed iteratively: To generate all candidate partitions
with $N-i$ sets, we perform all admissible pair-wise merges
on all SFS(2)-partitions with $N-i+1$ sets. An admissible
pair-wise merge is a merge of two sets of a partition, where
each of the two sets contains one element of the admissible pair. From
the resulting partitions one drops those violating SFS(2) before
performing the next iteration. Since this algorithm generates
some partitions multiple times (see the toy example in
Section V), in every iteration all duplicates are removed. The
algorithm is presented in Table I.

Iterative generation of the partitions by admissible pair-wise
merges allows application of Proposition 2, which reduces
the number of partitions to be searched. If the number of
admissible pairs is small compared to $N(N-1)/2$, then this
reduction is significant. Inefficiencies in our algorithm caused
by multiple considerations of the same partitions could be
alleviated by adapting the classical algorithms for the partition
generating problem [12], [13].

The actual choice of one of the obtained SFS(2)-partitions 
for model order reduction requires additional model-specific 
considerations: A possible criterion could be maximum com- 
pression (i.e., smallest entropy of the marginal distribution). 
TABLE I

Algorithm for listing all SFS(2)-lumpings

    procedure ListLumpings(P)
        admPairs ← GetAdmissiblePairs(P)
        Lumpings(1) ← merge(admPairs)              ▷ Convert pairs to functions
        n ← 1
        while notEmpty(Lumpings(n)) do
            n ← n + 1
            Lumpings(n) ← [ ]
            for h ∈ Lumpings(n - 1) do
                for {i1, i2} ∈ admPairs do
                    g ← h
                    g(h^{-1}(h(i2))) ← g(i1)       ▷ i1 and i2 have same image
                    if g is SFS(2) then
                        Lumpings(n) ← [Lumpings(n); g]
                    end if
                end for
            end for
            Remove duplicates from Lumpings(n)
        end while
        return Lumpings
    end procedure

    function GetAdmissiblePairs(P)
        Pairs ← [ ]
        N ← dim(P)
        for i1 = 1 : N - 1 do
            for i2 = i1 + 1 : N do
                f ← merge(i1, i2)                  ▷ f merges i1 and i2
                if f is SFS(2) then
                    Pairs ← [Pairs; {i1, i2}]
                end if
            end for
        end for
        return Pairs
    end function
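The pseudocode of Table I can be turned into a short runnable sketch. The version below is one possible reading, not the authors' implementation: partitions are represented as frozensets of frozensets (so that duplicates vanish automatically), states are 0-based indices, and the names `is_sfs2`, `merge`, and `list_sfs2_lumpings` are hypothetical. It also starts from the trivial partition, so the first generated level coincides with the admissible pairs.

```python
from itertools import combinations


def is_sfs2(adjacency, partition):
    """Check condition (8) for a partition given as a set of frozensets."""
    for source in partition:
        reachable = {j for i in source
                     for j, a in enumerate(adjacency[i]) if a}
        if any(len(reachable & target) > 1 for target in partition):
            return False
    return True


def merge(partition, i1, i2):
    """Merge the blocks containing i1 and i2 (no-op if they already coincide)."""
    b1 = next(b for b in partition if i1 in b)
    b2 = next(b for b in partition if i2 in b)
    return (partition - {b1, b2}) | {b1 | b2}


def list_sfs2_lumpings(adjacency):
    """Return (admissible pairs, list of levels); level k holds partitions into N-k sets."""
    n = len(adjacency)
    trivial = frozenset(frozenset([i]) for i in range(n))
    adm_pairs = [(i1, i2) for i1, i2 in combinations(range(n), 2)
                 if is_sfs2(adjacency, merge(trivial, i1, i2))]
    levels = [{trivial}]
    while levels[-1]:
        candidates = {merge(h, i1, i2)
                      for h in levels[-1] for i1, i2 in adm_pairs}
        candidates -= levels[-1]  # merges inside one block reproduce h itself
        levels.append({g for g in candidates if is_sfs2(adjacency, g)})
    return adm_pairs, levels[:-1]
```

Using sets of frozensets replaces the explicit "remove duplicates" step of Table I; the price is that every candidate is still generated before it is discarded, so the inefficiency discussed above remains.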
Fig. 1. A transition graph on 6 vertices with a lumping given by the partition
{{1, 2, 3}, {4, 5}, {6}}. The lumping is of type SFS(2).



V. A Toy Example 



We illustrate our algorithm by means of a small example.
Consider the six-state Markov chain with the transition graph
depicted in Fig. 1, whose adjacency matrix $A$ is given in (10).

Fig. 2. An illustration of the algorithm of Table I by means of the example depicted in Fig. 1. The first row shows all admissible pairs; the algorithm
runs through all rows (top to bottom) by merging according to the admissible pairs (left to right). Bold, red arrows indicate newly generated partitions; gray
arrows indicate that this partition was already found and is thus removed as a duplicate. Gray partitions violate the SFS(2)-property. This figure lists all
SFS(2)-partitions of $\mathcal{X}$ (cf. Table II).



TABLE II

List of SFS(2)-lumpings of the example found by the algorithm



$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1
\end{pmatrix} \tag{10}$$



Since all states have out-degree $d_i = 3$, lumpings to at
least $M = 3$ states are considered. The lumping in Fig. 1
satisfies the SFS(2)-property. Fig. 2 shows the derivation of
the lumping of Fig. 1 by our algorithm.

During initialization we evaluate all 15 possible pair-wise
merges. Of these, we exclude all pairs where both members
are accessible from the same state, i.e., {2,5}, {2,6}, {5,6},
{3,4}, {3,6}, {4,6}, {1,4}, and {1,6}. Furthermore, {2,4}
and {3,5} are excluded too; the former because both states
have self-loops, the latter because both states are connected in
either direction. Only five pairs are admissible, namely {1,2},
{1,3}, {1,5}, {2,3}, and {4,5}.

One admissible pair is {1,2}, i.e., the function $h$
merging {1,2} and, thus, inducing the partition $\mathcal{Z}_5$ =
{{1,2}, {3}, {4}, {5}, {6}}, satisfies SFS(2). With this $h$ we
enter the algorithm in the innermost loop (Table I, line 9).
The algorithm performs pair-wise merges according to the
five admissible pairs and obtains the following merges: {1,2},
{1,2,3}, {1,2,5}, {{1,2}, {4,5}}; the first is a (trivial) duplicate
(by performing a pair-wise merge according to {1,2}) and
the second is obtained twice (by pairing {1,2} with {1,3} and
{2,3}). Only {1,2,5} violates SFS(2). The functions merging
{1,2,3} and {{1,2}, {4,5}} are added to the list of lumping
functions to four states, and the procedure is repeated for a
different admissible pair.

For the next iteration, fix $h$ such that it induces the partition
$\mathcal{Z}_4$ = {{1,2,3}, {4}, {5}, {6}}. The five admissible pairs
yield the merges {1,2,3}, a duplicate which is
obtained three times, {1,2,3,5}, which violates SFS(2), and
{{1,2,3}, {4,5}}, which is the solution depicted in Fig. 1.



    M    Partition Z_M
    ---  ----------------------------------
    6    {1}, {2}, {3}, {4}, {5}, {6}
    5    {1,2}, {3}, {4}, {5}, {6}
         {1,3}, {2}, {4}, {5}, {6}
         {1,5}, {2}, {3}, {4}, {6}
         {1}, {2,3}, {4}, {5}, {6}
         {1}, {2}, {3}, {4,5}, {6}
    4    {1,2,3}, {4}, {5}, {6}
         {1,2}, {3}, {4,5}, {6}
         {1,3}, {2}, {4,5}, {6}
         {1}, {2,3}, {4,5}, {6}
    3    {1,2,3}, {4,5}, {6}



The algorithm terminates now, since every pair-wise merge of
$\mathcal{Z}_3 = \mathcal{Y}$ = {{1,2,3}, {4,5}, {6}} either violates SFS(2) or
is a duplicate. The list of all SFS(2)-lumpings found by the
algorithm is given in Table II.
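The toy example is small enough to cross-check by exhaustive search over all 203 set partitions of six states. The sketch below uses a different technique than Table I (plain enumeration instead of iterative merging). The successor sets are an assumption reconstructed from the excluded pairs listed above: states 1 to 3 lead to {2,5,6}, states 4 and 5 to {3,4,6}, and state 6 to {1,4,6}. All names are hypothetical.

```python
from collections import Counter

# Successor sets (1-based states), reconstructed from the excluded pairs in the text.
SUCCESSORS = {1: {2, 5, 6}, 2: {2, 5, 6}, 3: {2, 5, 6},
              4: {3, 4, 6}, 5: {3, 4, 6}, 6: {1, 4, 6}}


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def is_sfs2(partition):
    """Condition (8): from each block, at most one state per block is accessible."""
    for source in partition:
        reachable = set().union(*(SUCCESSORS[i] for i in source))
        if any(len(reachable & set(target)) > 1 for target in partition):
            return False
    return True


sfs2 = [p for p in set_partitions([1, 2, 3, 4, 5, 6]) if is_sfs2(p)]
counts = Counter(len(p) for p in sfs2)
print(dict(sorted(counts.items())))  # {3: 1, 4: 4, 5: 5, 6: 1}
```

Under the stated assumption on the successor sets, the counts per number of sets agree with the eleven partitions of Table II.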

VI. Clustering a bi-gram model 

We apply our algorithm to a bi-gram² letter model. Commonly
used in speech processing [1, Ch. 6], n-grams (of which
bi-grams are a special case) are (n-1)th-order Markov models
for the occurrence of letters or words. From a set of training
data the relative frequency of the (co-)occurrence of letters
or words is determined, yielding the maximum likelihood
estimate of their (conditional) probabilities. In practice, for
large n, even large training data cannot contain all possible
sequences, so the n-gram model will contain a considerable
number of zero transition probabilities. Since this would lead
to problems in, e.g., a speech recognition system, those entries
are increased by a small constant to smooth the model, for
example using Laplace's law [1, p. 202].

²Shannon used bi-grams, or digrams as he called them, as a second-order
approximation of the English language [14].

Fig. 3. The adjacency matrix of the bi-gram model of "The Great Gatsby".
The first two states are line break (LB) and space (' '), followed by
punctuation marks. The block in the lower right corner indicates interactions of
letters and punctuation following letters.

Since by Proposition 3 an information-preserving lumping
is more efficient for a sparse transition matrix, we refrain
from smoothing and use the maximum likelihood estimates of
the model parameters instead. We trained a Markov bi-gram
letter model of F. Scott Fitzgerald's "The Great Gatsby", a
text containing roughly 270000 letters. To reduce the alphabet
size and, thus, the run-time of the algorithm, we replaced all
numbers by '#' and all upper case by lower case letters. We
left punctuation unchanged, yielding a total alphabet size of
41. The adjacency matrix of the bi-gram model can be seen
in Fig. 3; the maximum out-degree of the Markov chain is 37.
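The training step described above amounts to counting adjacent symbol pairs. The following sketch shows one way to obtain the unsmoothed maximum likelihood estimates and the out-degrees from a text string; the preprocessing mirrors the description (digits to '#', upper case to lower case), while the function names and the sparse dictionary representation are assumptions made for illustration.

```python
from collections import Counter


def normalize(text):
    """Map digits to '#' and upper-case letters to lower case."""
    return "".join("#" if c.isdigit() else c.lower() for c in text)


def bigram_model(text):
    """Return (alphabet, P); P[a][b] is the ML estimate of Pr(next = b | current = a).

    Only positive entries are stored, so len(P[a]) is the out-degree of symbol a.
    A symbol that occurs only as the very last character has no row in P.
    """
    symbols = normalize(text)
    pair_counts = Counter(zip(symbols, symbols[1:]))
    alphabet = sorted(set(symbols))
    row_totals = Counter()
    for (a, _), count in pair_counts.items():
        row_totals[a] += count
    P = {a: {b: pair_counts[(a, b)] / row_totals[a]
             for b in alphabet if pair_counts[(a, b)] > 0}
         for a in alphabet if row_totals[a] > 0}
    return alphabet, P


alphabet, P = bigram_model("abAB1")
print(alphabet, max(len(row) for row in P.values()))
```

Since no smoothing is applied, every pair that never occurs in the text stays at probability zero, which is what keeps the transition graph sparse.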

Of the 820 possible merges only 21 are admissible. Furthermore,
there are 129, 246, and 90 SFS(2)-lumpings to
sets of cardinalities 39, 38, and 37, respectively. There are
only two admissible triples, namely {LB, '$', 'x'} and {LB,
'(', 'x'}, where LB denotes the line break. Of the more
notable pair-wise merges we mention {'(', ')'}, {'(', 'z'},
and the merges of '#' with colon, semicolon, and exclamation
mark. Especially the first is intuitive, since parentheses can be
replaced by, e.g., '|' while preserving the meaning of the
symbol³.

We finally determined the lumping yielding maximum compression,
i.e., the one for which $H(Y)$ is minimized. This
lumping, merging {LB, '$', 'x'}, {'!', '#'}, and {'(', ','},
decreases the entropy from $H(X) = 4.3100$ to $H(Y) = 4.3044$.
The entropies roughly correspond to the 4.03 bits
derived for Shannon's first-order model, which contains only
27 symbols [9, p. 170].
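The compression criterion used here compares the entropies of the marginal distributions before and after merging. A minimal sketch, with hypothetical function names and a made-up three-state distribution (not the Gatsby data), is:

```python
from math import log2


def entropy(distribution):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in distribution if p > 0)


def lumped_marginal(mu, partition):
    """Marginal distribution of Y: sum the invariant probabilities per block."""
    return [sum(mu[i] for i in block) for block in partition]


mu = [0.5, 0.25, 0.25]                  # hypothetical invariant distribution
partition = [[0], [1, 2]]               # merge states 1 and 2
print(entropy(mu))                      # 1.5
print(entropy(lumped_marginal(mu, partition)))  # 1.0
```

Among all SFS(2)-partitions returned by the algorithm, the one minimizing this lumped entropy would be selected.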

While the compression obtained with our algorithm seems
negligible, we are confident that it improves for lumping n-grams
with n > 2, since these appear to be even more sparse
than the bi-gram model. For example, the Markov chain of a
tri-gram model trained with Fitzgerald's text has a maximum
out-degree of 32, compared to 37 for the bi-gram model. What
currently prevents testing these claims is the lack of an efficient
implementation of our algorithm.

³Whether the symbol initiates or terminates a parenthetic expression is
determined by whether the symbol is preceded or succeeded by a blank space.
Unless parenthetic expressions are nested, simple counting distinguishes
between initiation and termination.

Some merges which preserve information are not found by
the algorithm: For example, since in the text every 'q' is
followed by a 'u', and since no two 'u' occur in a row, it would
be possible to merge these two letters: No information is lost,
since if the merged state is visited once and left immediately,
only a 'u' is possible. Conversely, if the merged state occurs
twice it can only be an occurrence of 'qu'. However, the model
obtained by this merge does not possess the Markov property
of any order, and thus violates SFS(2).

VII. Conclusion 

We presented a sufficient condition for merging states of
a Markov chain such that the resulting process is second-order
Markov and preserves full model information. We furthermore
developed an iterative algorithm finding all such merges for a 
given transition matrix. Finally, we presented a lower bound 
on the cardinality of the reduced state space depending on the 
maximum out-degree of the associated transition graph. 

The application of our algorithm to a bi-gram letter model 
suggests its practical relevance for model-order reduction, e.g., 
for n-grams with n > 2. Future work shall investigate possible 
improvements of the algorithm as well as its complexity 
analysis. 
Appendix
A. Proof of Proposition 1

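We recall from [8] that $\bar{H}(\mathbf{X}|\mathbf{Y}) = 0$ implies, for all $n$ and
for all $x_1, x_n \in \mathcal{X}$ and $\mathbf{y} \in \mathcal{Y}^{n-2}$ for which the conditioning
event has positive probability,

$$\exists!\, \mathbf{x} \in \mathcal{X}^{n-2}\colon \Pr(X_2^{n-1} = \mathbf{x} \mid X_1 = x_1, Y_2^{n-1} = \mathbf{y}, X_n = x_n) = 1. \tag{11}$$

We thus obtain a bound on a realization $t_n$ of the preimage
count (i.e., for $Y_1^n = \mathbf{y}$):

$$\begin{aligned}
t_n &= \sum_{\mathbf{x} \in g^{-1}[\mathbf{y}]} \left[\Pr(X_1^n = \mathbf{x}) > 0\right] \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} \left[\Pr(X_1^n = \mathbf{x} \mid Y_1^n = \mathbf{y}) > 0\right] \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} \left[\Pr(X_1 = x_1, X_n = x_n \mid Y_1^n = \mathbf{y}) > 0\right] \times \left[\Pr(X_2^{n-1} = \mathbf{x}_2^{n-1} \mid Y_1^n = \mathbf{y}, X_1 = x_1, X_n = x_n) > 0\right] \\
&\overset{(a)}{=} \sum_{x_1 \in g^{-1}[y_1]} \sum_{x_n \in g^{-1}[y_n]} \left[\Pr(X_1 = x_1, X_n = x_n \mid Y_1^n = \mathbf{y}) > 0\right] \\
&\le N^2 < \infty
\end{aligned}$$

where (a) is due to (11). Since this holds for all $n$ and all
realizations, this proves

$$\bar{H}(\mathbf{X}|\mathbf{Y}) = 0 \;\Rightarrow\; \exists C < \infty\colon \sup_n T_n \le C. \tag{12}$$

With (3a), the reverse implication is trivial. ■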

B. Proof of Corollary 1

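The proof employs elementary results from graph theory:
Let $A$ denote the adjacency matrix of the Markov chain, i.e.,
$A_{i,j} = [P_{i,j} > 0]$. The number of closed walks of length $k$ on
the graph determined by $A$ is given as [15, p. 24]

$$\sum_{i=1}^{N} \lambda_i^k \tag{13}$$

where $\{\lambda_i\}_{i=1}^{N}$ is the set of eigenvalues of $A$.

Let $t_{\mathcal{X}}^k$ denote the number of sequences $\mathbf{x} \in \mathcal{X}^k$ of $\mathbf{X}$ with
positive probability, i.e.,

$$t_{\mathcal{X}}^k = \sum_{\mathbf{x} \in \mathcal{X}^k} \left[\Pr(X_1^k = \mathbf{x}) > 0\right]. \tag{14}$$

Clearly, $t_{\mathcal{X}}^k \ge \sum_{i=1}^{N} \lambda_i^k$. Furthermore, defining $t_{\mathcal{Y}}^k$ similarly,
we obtain $t_{\mathcal{Y}}^k \le M^k$. With $\lambda_{\max}$ denoting the largest eigenvalue
of $A$,

$$\frac{t_{\mathcal{X}}^k}{t_{\mathcal{Y}}^k} \ge \frac{\sum_{i=1}^{N} \lambda_i^k}{M^k} \ge \left(\frac{\lambda_{\max}}{M}\right)^k. \tag{15}$$

If $\lambda_{\max} > M$, then the ratio of possible length-$k$ sequences
of $\mathbf{X}$ to those of $\mathbf{Y}$ increases exponentially. Then the
pigeonhole principle implies that the preimage count $T_n$ is
also unbounded. Thus,

$$\bar{H}(\mathbf{X}|\mathbf{Y}) = 0 \;\Rightarrow\; M \ge \lambda_{\max}. \tag{16}$$

Finally, the Perron-Frobenius theorem for non-negative matrices
[16, Cor. 8.3.3] bounds the largest eigenvalue of $A$ from
below by the minimum out-degree of $P$. ■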

C. Proof of Proposition 2

We prove the proposition by contraposition: Assume $(P,h)$
violates SFS($k$). Then there exist $\mathbf{z} \in \mathcal{Z}^{k-1}$ and $z \in \mathcal{Z}$ such
that there exist two distinct $\mathbf{x}', \mathbf{x}'' \in h^{-1}[\mathbf{z}]$ and two, not
necessarily distinct, $x', x'' \in h^{-1}[z]$ such that

$$\Pr(X_2^k = \mathbf{x}' \mid Z_2^k = \mathbf{z}, X_1 = x') > 0 \tag{17}$$

and

$$\Pr(X_2^k = \mathbf{x}'' \mid Z_2^k = \mathbf{z}, X_1 = x'') > 0. \tag{18}$$

In other words, there are two different sequences $\mathbf{x}', \mathbf{x}''$
accessible from either the same ($x' = x''$) or from different
($x' \ne x''$) starting states.

Now take $\mathbf{y} = f(\mathbf{z})$ and $y = f(z)$. Since $h$ is a refinement
of $g$, we have $h^{-1}[\mathbf{z}] \subseteq g^{-1}[\mathbf{y}]$ and $h^{-1}[z] \subseteq g^{-1}[y]$. As a
consequence, $\mathbf{x}', \mathbf{x}'' \in g^{-1}[\mathbf{y}]$ and $x', x'' \in g^{-1}[y]$, implying
that $(P,g)$ violates SFS($k$). This proves

$$(P,h) \text{ violates SFS}(k) \;\Rightarrow\; (P,g) \text{ violates SFS}(k). \tag{19}$$

The contrapositive of this statement completes the proof. ■